Kenneth Tay
Oct 1, 2019
http://web.stanford.edu/~kjytay/courses/stats32-aut2019/
classes <- list(quarter = "Fall 2018/19",
ID = c("STATS 32", "STATS 101", "STATS 200"),
credits = 12)
classes$ID
## [1] "STATS 32" "STATS 101" "STATS 200"
## [1] 12
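Besides `$`, you can pull out a list element with `[[`, either by name or by position. `names()` returns the element names. Here is a short sketch that reuses the `classes` list from above (base R only):

```r
# Recreate the list from above so this block runs on its own
classes <- list(quarter = "Fall 2018/19",
                ID = c("STATS 32", "STATS 101", "STATS 200"),
                credits = 12)

classes$credits        # access by name with $
classes[["credits"]]   # same element, by name with [[
classes[[3]]           # same element, by position
names(classes)         # "quarter" "ID" "credits"
length(classes$ID)     # 3 class IDs
```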
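A data frame is a special type of list. Here is the structure of the built-in `mtcars` data frame, as shown by `str(mtcars)`: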
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
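A data frame is a list of equal-length columns, so list tools work on it too. A minimal sketch using the built-in `mtcars` data (base R only):

```r
# mtcars ships with base R (datasets package)
is.data.frame(mtcars)                   # TRUE
is.list(mtcars)                         # TRUE: each column is one list element
length(mtcars)                          # 11, one element per column
mtcars$mpg[1:3]                         # 21.0 21.0 22.8, list-style column access
identical(mtcars$cyl, mtcars[["cyl"]])  # TRUE
```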
str, summary
head, tail
names, dim, nrow, ncol
table
mean, median, sd, var
factor
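Here is a quick sketch of the functions listed above, applied to `mtcars` (base R). The comments give the values for this dataset:

```r
summary(mtcars$mpg)   # min, quartiles, mean, max
head(mtcars, 3)       # first 3 rows; tail() gives the last rows
dim(mtcars)           # 32 11; nrow() and ncol() give each part
table(mtcars$cyl)     # 11 cars with 4 cylinders, 7 with 6, 14 with 8
c(mean = mean(mtcars$mpg), median = median(mtcars$mpg),
  sd = sd(mtcars$mpg), var = var(mtcars$mpg))
cyl_f <- factor(mtcars$cyl)  # treat cylinders as categorical
levels(cyl_f)                # "4" "6" "8"
```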
ggplot2 (and the + syntax)
“The simple graph has brought more information to the data analyst’s mind than any other device.” - John Tukey
## mpg weight cylinders
## 1 21.0 2.620 6
## 2 21.0 2.875 6
## 3 22.8 2.320 4
## 4 21.4 3.215 6
## 5 18.7 3.440 8
## 6 18.1 3.460 6
## 7 14.3 3.570 8
## 8 24.4 3.190 4
## 9 22.8 3.150 4
## 10 19.2 3.440 6
## 11 17.8 3.440 6
## 12 16.4 4.070 8
## 13 17.3 3.730 8
## 14 15.2 3.780 8
## 15 10.4 5.250 8
## 16 10.4 5.424 8
## 17 14.7 5.345 8
## 18 32.4 2.200 4
## 19 30.4 1.615 4
## 20 33.9 1.835 4
## 21 21.5 2.465 4
## 22 15.5 3.520 8
## 23 15.2 3.435 8
## 24 13.3 3.840 8
## 25 19.2 3.845 8
## 26 27.3 1.935 4
## 27 26.0 2.140 4
## 28 30.4 1.513 4
## 29 15.8 3.170 8
## 30 19.7 2.770 6
## 31 15.0 3.570 8
## 32 21.4 2.780 4
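The table above appears to hold the `mpg`, `wt` and `cyl` columns of `mtcars` under new names, since the values match row for row. Assuming that, here is one way to build it. The name `car_data` is hypothetical:

```r
# Assumption: the columns come from the built-in mtcars data
car_data <- data.frame(mpg = mtcars$mpg,
                       weight = mtcars$wt,     # weight in 1000 lbs
                       cylinders = mtcars$cyl)
head(car_data)
```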
“The simple graph has brought more information to the data analyst’s mind than any other device.” - John Tukey
## mpg weight cylinders
## 1 21.0 2.620 6
## 2 21.0 2.875 6
## 3 22.8 2.320 4
## 4 21.4 3.215 6
## 5 18.7 3.440 8
## 6 18.1 3.460 6
## 7 14.3 3.570 8
## 8 24.4 3.190 4
## 9 22.8 3.150 4
## 10 19.2 3.440 6
## 11 17.8 3.440 6
## 12 16.4 4.070 8
## 13 17.3 3.730 8
## 14 15.2 3.780 8
## 15 10.4 5.250 8
## 16 10.4 5.424 8
## 17 14.7 5.345 8
## 18 32.4 2.200 4
## 19 30.4 1.615 4
## 20 33.9 1.835 4
## 21 21.5 2.465 4
## 22 15.5 3.520 8
## 23 15.2 3.435 8
## 24 13.3 3.840 8
## 25 19.2 3.845 8
## 26 27.3 1.935 4
## 27 26.0 2.140 4
## 28 30.4 1.513 4
## 29 15.8 3.170 8
## 30 19.7 2.770 6
## 31 15.0 3.570 8
## 32 21.4 2.780 4
What is the distribution of cylinders in my dataset?
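A bar chart counts the rows at each level of a categorical variable. This minimal sketch assumes ggplot2 is installed. It uses `mtcars` directly, where `cyl` is the cylinders column:

```r
library(ggplot2)

# cyl is stored as a number; factor() makes ggplot treat it as categories
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +                        # bar height = number of rows per level
  labs(x = "cylinders", y = "count")
print(p)
```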
What is the distribution of miles per gallon in my dataset?
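For a continuous variable such as mpg, a histogram shows the distribution. This sketch makes the same assumptions as above. `binwidth = 2` is an arbitrary illustrative choice:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)   # each bar covers a 2-mpg range
print(p)
```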
What is the relationship between mpg and weight?
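Two continuous variables call for a scatterplot. This sketch uses `mtcars`, where `wt` is weight in 1000 lbs, and assumes ggplot2 is installed:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                       # one point per car
  labs(x = "weight (1000 lbs)", y = "mpg")
print(p)
```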
What is the relationship between mpg and time?
Not so good…
Easier to see the trend
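When the x-axis is time, joining the points with a line makes the trend easier to follow. The slide's data is not shown here, so this sketch substitutes ggplot2's built-in `economics` dataset as a stand-in. That dataset holds monthly US figures, and the unemployment count is in the `unemploy` column:

```r
library(ggplot2)

# Stand-in data (assumption): ggplot2's economics dataset
p_points <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_point()   # points only: the trend is hard to follow
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()    # connecting consecutive dates shows the trend
print(p_line)
```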
For each value of cylinder, what does the distribution of mpg look like?
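Side-by-side boxplots show the distribution of a continuous variable within each level of a categorical one. This sketch uses `mtcars` and assumes ggplot2 is installed:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +               # one box per cylinder count
  labs(x = "cylinders")
print(p)
```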
How often does each pair of cylinder and gear occur in the dataset?
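For two categorical variables, `table()` gives the counts directly. `geom_count()` draws one point per observed pair, with point size proportional to the count. This sketch assumes ggplot2 is installed:

```r
library(ggplot2)

# Cross-tabulate: rows = cylinders, columns = gears
pair_counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)
pair_counts

# One point per (cyl, gear) pair that occurs; size shows how often
p <- ggplot(mtcars, aes(x = factor(cyl), y = factor(gear))) +
  geom_count() +
  labs(x = "cylinders", y = "gears")
print(p)
```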
I have father-son pairs. For each pair, I record their heights and weights, as well as their ethnicities. I want to study the relationship between the characteristics of the father and those of the son. What plots could help me?
ggplot2
ggplot2 package
ggplot2 reference manual
Data: Dataset we are using for the plot
## mpg weight cylinders
## 1 21.0 2.620 6
## 2 21.0 2.875 6
## 3 22.8 2.320 4
## 4 21.4 3.215 6
## 5 18.7 3.440 8
## 6 18.1 3.460 6
## 7 14.3 3.570 8
## 8 24.4 3.190 4
## 9 22.8 3.150 4
## 10 19.2 3.440 6
Geometries: Visual elements used for our data
Geom: point
Aesthetics: Defines the data columns which affect various aspects of the geom
3 different aesthetics:
ggplot2 code
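This puts the three pieces together: data, a point geom, and aesthetics. The slide's exact code is not shown. This sketch assumes the three aesthetics are x, y and colour, and uses `mtcars` columns:

```r
library(ggplot2)

p <- ggplot(data = mtcars) +                 # Data
  geom_point(                                # Geom: point
    mapping = aes(x = wt, y = mpg,           # Aesthetics: x, y ...
                  colour = factor(cyl))      # ... and colour by cylinders
  )
print(p)
```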
Optional material
One graphic contains:
Sometimes we need to tweak the position of the geometric elements because they obscure each other.
Only 9 data points??
Much better
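One common fix for overplotting is jittering, which adds a little random noise to each point's position. This sketch uses `mtcars`, where many cars share the same (`cyl`, `gear`) pair. The jitter width, height and `seed` are illustrative choices:

```r
library(ggplot2)

# Without jitter, cars with the same (cyl, gear) pair land on one spot
p_overplot <- ggplot(mtcars, aes(x = cyl, y = gear)) +
  geom_point()

# position_jitter() nudges each point randomly; seed makes it reproducible
p_jitter <- ggplot(mtcars, aes(x = cyl, y = gear)) +
  geom_point(position = position_jitter(width = 0.2, height = 0.2, seed = 1))
print(p_jitter)
```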
Default colors
Manually chosen colors
rgb(0,0,1) (blue), rgb(1,0,0) (red), rgb(0,0,0) (black), rgb(1,1,1) (white)
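`rgb()` takes red, green and blue intensities between 0 and 1 and returns a hex color string, which `scale_colour_manual()` accepts. This sketch assigns three of the colors above to the cylinder groups. The mapping of color to group is arbitrary:

```r
library(ggplot2)

rgb(0, 0, 1)   # "#0000FF" (blue)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  # one color per level of factor(cyl), in level order: 4, 6, 8
  scale_colour_manual(values = c(rgb(0, 0, 1), rgb(1, 0, 0), rgb(0, 0, 0)))
print(p)
```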